Hit rate

References

[1] Bowers et al. "Prospective Hot-Spotting: The Future of Crime Mapping?"
[2] Mohler et al. "Self-Exciting Point Process Modeling of Crime"

Technique

Select a coverage level, say 10% (or 5%, or 20%). Select this percentage of the grid cells, from most risky to least risky, according to the prediction. In [1], for example, this is referred to as forming the "hotspot". We then regard a future crime event as "captured" if it falls in the selected area. The hit rate is the percentage of events captured.
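The steps above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function name `hit_rate`, the representation of events as (row, column) cell indices, the rounding down of the number of selected cells, and the tie-breaking by cell index are all assumptions made for the example.

```python
import numpy as np

def hit_rate(risk, event_cells, coverage):
    """Fraction of events falling in the top `coverage` fraction of grid cells.

    risk        : 2D array of predicted risk, one value per grid cell.
    event_cells : sequence of (row, col) cell indices, one per crime event
                  (assumed already assigned to cells).
    coverage    : fraction of cells to select, e.g. 0.1 for 10%.
    """
    risk = np.asarray(risk, dtype=float)
    flat = risk.ravel()
    # Number of cells in the hotspot; rounding down is an assumption.
    n_selected = int(np.floor(coverage * flat.size + 1e-9))
    # Order cells from most to least risky; ties are broken by cell index.
    order = np.argsort(-flat, kind="stable")
    selected = np.zeros(flat.size, dtype=bool)
    selected[order[:n_selected]] = True
    selected = selected.reshape(risk.shape)

    events = np.asarray(event_cells, dtype=int)
    if events.size == 0:
        return float("nan")  # no events, so the hit rate is undefined
    return float(np.mean(selected[events[:, 0], events[:, 1]]))

# Toy 2 x 5 grid: at 20% coverage the hotspot is the two riskiest cells.
risk = [[0.1, 0.9, 0.2, 0.0, 0.3],
        [0.8, 0.05, 0.4, 0.0, 0.1]]
events = [(0, 1), (0, 1), (1, 0), (1, 2), (0, 4)]
example_rate = hit_rate(risk, events, 0.2)  # 3 of the 5 events are captured
```

In the toy example the hotspot consists of cells (0, 1) and (1, 0), which contain three of the five events.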

More formally, we issue predictions for times $t_1 < t_2 < \cdots < t_n$, typically using all available information before time $t_i$ when forming the prediction for time $t_i$. Each prediction is valid for a time period $s_i$; typically each $s_i=s$ (say, one day) and $t_{i+1} =t_i +s$, but this is not strictly necessary.

For each prediction at time $t_i$ we form the selected region (the "hotspot") and calculate the fraction of captured events, say $x_i \in [0,1]$. This yields a sequence $(x_i)$, and it is common to report one or more summary statistics of the $(x_i)$ (e.g. the mean or the inter-quartile range).
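Given the sequence $(x_i)$, the summary statistics mentioned above might be computed as in the sketch below. The function name `summarise_hit_rates` is hypothetical, and dropping periods with no events (recorded here as NaN) is an assumption; other conventions are possible.

```python
import numpy as np

def summarise_hit_rates(x):
    """Summary statistics for a sequence of per-prediction hit rates x_i."""
    x = np.asarray(x, dtype=float)
    # Assumption: periods with no events are stored as NaN and ignored.
    x = x[~np.isnan(x)]
    q25, q75 = np.percentile(x, [25, 75])
    return {
        "n": int(x.size),
        "mean": float(np.mean(x)),
        "median": float(np.median(x)),
        "iqr": float(q75 - q25),
        # Standard error of the mean, using the sample standard deviation.
        "standard_error": float(np.std(x, ddof=1) / np.sqrt(x.size)),
    }

# Five daily hit rates plus one day without any events.
summary = summarise_hit_rates([0.2, 0.4, 0.0, 0.6, 0.3, float("nan")])
```

Note that the standard error treats the $x_i$ as independent, which may not hold when successive predictions share most of their training data.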

Caveats

For a fixed study region, coverage is in direct proportion to area. However, when comparing different regions, or even potentially different grid sizes, it is more sensible to select on area. In [1], this idea of normalising by area is termed the "search efficiency rate".
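One possible way to select on area, sketched below, is to fix a target area and convert it into a number of cells for each grid. The function name `cells_for_area`, the use of metres as units, and the rounding down are assumptions for illustration; this is not the procedure of [1].

```python
def cells_for_area(target_area, cell_width, cell_height):
    """Number of whole grid cells fitting within `target_area`.

    All arguments use consistent units (here metres and square metres).
    """
    cell_area = cell_width * cell_height
    return int(target_area // cell_area)

# The same 1 km^2 hotspot expressed on two different grids.
n_cells_250m = cells_for_area(1_000_000, 250, 250)  # 16 cells
n_cells_150m = cells_for_area(1_000_000, 150, 150)  # 44 cells
```

Because whole cells are selected, the area actually covered can differ slightly between grids (here 1.00 km² against 0.99 km²).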

Advantages

Easy to understand.
Easy to compute. It is computationally efficient to calculate a range of coverages at the same time.
Depends only on the relative risk, and hence might be better suited to some of the "a-theoretical" prediction methods.
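To illustrate why a range of coverages is cheap to compute together: once the cells are sorted by risk, a single cumulative sum of event counts gives the number of captured events for every hotspot size. The sketch below assumes events have already been counted per cell; the function name `hit_rate_curve` and the rounding down of cell counts are assumptions.

```python
import numpy as np

def hit_rate_curve(risk, event_counts, coverages):
    """Hit rate at each coverage level, from one sort and one cumulative sum.

    risk         : array of predicted risk per cell (any shape).
    event_counts : array of the same shape, number of events in each cell.
    coverages    : sequence of coverage fractions in [0, 1].
    """
    risk = np.asarray(risk, dtype=float).ravel()
    counts = np.asarray(event_counts, dtype=float).ravel()
    # Cells ordered from most to least risky; ties are broken by cell index.
    order = np.argsort(-risk, kind="stable")
    # captured[k] is the number of events in the k riskiest cells.
    captured = np.concatenate([[0.0], np.cumsum(counts[order])])
    n_selected = np.floor(np.asarray(coverages) * risk.size + 1e-9).astype(int)
    # If there are no events at all, the division below yields NaN.
    return captured[n_selected] / captured[-1]

risk = [0.1, 0.9, 0.2, 0.0, 0.3, 0.8, 0.05, 0.4, 0.0, 0.1]
counts = [0, 2, 0, 0, 1, 1, 0, 1, 0, 0]
curve = hit_rate_curve(risk, counts, [0.1, 0.2, 0.5])  # hit rates 0.4, 0.6, 1.0
```

The sort dominates the cost, so evaluating additional coverage levels is essentially free.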

Disadvantages

We might worry about the modifiable areal unit problem (MAUP), in that a crime event is either "captured" or not, regardless of how close or far it is from being in a covered cell. In our experiments, we have found that changing the grid positioning alone can lead to a noticeable change in hit rate.
It is not clear which summary statistic(s) to report. The mean is commonly the only value used. [2] reports the standard error.
The major difference in performance between different prediction algorithms seems, in our experiments, to be due to qualitative changes in the shape of the "hot-spots". It is thus not clear if we are really comparing like with like (for example, from a policing operations perspective). For this reason, it is common to also report some summary statistic(s) about the shape of the hot-spots.
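The sensitivity to grid positioning noted above comes from the way events are assigned to cells. The toy sketch below (the function name `to_cells`, the coordinates and the cell size are all made up for illustration) shows two nearby events that fall in different cells on one grid and in the same cell after the grid is shifted by 10 units.

```python
import numpy as np

def to_cells(points, cell_size, offset=(0.0, 0.0)):
    """Map (x, y) coordinates to integer (x index, y index) grid cells.

    `offset` is the position of the grid origin; changing it shifts the grid.
    """
    pts = np.asarray(points, dtype=float)
    return np.floor((pts - np.asarray(offset, dtype=float)) / cell_size).astype(int)

# Two events 10 units apart, either side of a cell boundary at x = 100.
points = [(95.0, 40.0), (105.0, 40.0)]
cells_original = to_cells(points, cell_size=100.0)                     # cells (0, 0) and (1, 0)
cells_shifted = to_cells(points, cell_size=100.0, offset=(10.0, 0.0))  # both in cell (0, 0)
```

If only cell (0, 0) is in the hotspot, one event is captured on the original grid and both are captured on the shifted grid, even though neither the events nor the underlying risk have changed.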


